On the optimal trimming of high-throughput mRNA sequence data
Abstract
The widespread and rapid adoption of high-throughput sequencing technologies has given researchers the opportunity to gain a deep understanding of the genome-level processes that underlie evolutionary change and, perhaps more importantly, of the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence the mRNA transcriptomes of specific tissues, which are then often compared with those of other tissues or of other individuals with different phenotypes. While these techniques are extremely powerful, they require careful attention to data quality. Because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in every data processing pipeline. Several software packages for quality trimming exist, but no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations on the optimal strength of trimming, specifically for mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that gentler trimming, specifically of those nucleotides whose Phred score is <2 or <5, is optimal for most studies across a wide variety of metrics.
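To make the thresholds concrete, the following is a minimal sketch of gentle 3'-end quality trimming. It is not the procedure used in the study, which the abstract does not specify. It assumes Phred+33 quality encoding and a simple rule that removes trailing bases while their score is below the threshold; the function names are hypothetical, and established trimming tools generally apply more elaborate rules.

```python
from __future__ import annotations


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Base-call error probability implied by Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, quality: str, threshold: int,
                offset: int = 33) -> tuple[str, str]:
    """Drop trailing bases whose Phred score is below `threshold`.

    Illustrative rule only: trimming stops at the first base (scanning
    from the 3' end) whose score is at or above the threshold.
    """
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


# Hypothetical read: eight high-quality bases, then Q3 ('$') and Q0 ('!').
read_seq = "ACGTACGTAC"
read_qual = "IIIIIIII$!"

gentle_q2 = trim_3prime(read_seq, read_qual, threshold=2)
gentle_q5 = trim_3prime(read_seq, read_qual, threshold=5)
```

With this hypothetical read, a threshold of 2 removes only the final Q0 base, while a threshold of 5 also removes the Q3 base before it. A Phred score of 5 corresponds to an error probability of roughly 0.32, so both thresholds discard only bases that are very likely to be wrong.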
Similar resources
Throughput Maximization for Multi-Slot Data Transmission via Two-Hop DF SWIPT-Based UAV System
In this paper, an unmanned aerial vehicle (UAV) assisted cooperative communication system is studied, wherein a source transmits information to the destination through an energy harvesting decode-and-forward UAV. It is assumed that the UAV can move freely between the source-destination pair to set up line-of-sight communications with both nodes. Since the battery of the UAV may be limite...
Lcscanner: an Efficient and Accurate Trimming Tool for Illumina next Generation Sequencing Reads
Recent advances in High-Throughput Sequencing (HTS) technology have greatly facilitated research in the field of bioinformatics. With its ultra-high sequencing speed and improved base-calling accuracy, the Illumina Genome Analyzer is currently the most widely used platform in the field. To use the raw reads generated from the sequencing machine, the 3’ adapter sequence attached to the real read in t...
Scanner: An Efficient and Accurate Trimming Tool for Illumina Next Generation Sequencing Reads
Recent advances in High-Throughput Sequencing (HTS) technology have greatly facilitated research in the field of bioinformatics. With its ultra-high sequencing speed and improved base-calling accuracy, the Illumina Genome Analyzer is currently the most widely used platform in the field. To use the raw reads generated from the sequencing machine, the 3’ adapter sequence attached to the real read in t...
Btrim: A fast, lightweight adapter and quality trimming program for next-generation sequencing technologies
Btrim is a fast, lightweight program for trimming adapters and low-quality regions from reads produced by ultra-high-throughput next-generation sequencing machines. It can also reliably identify barcodes and assign the reads to the original samples. Based on a modified Myers's bit-vector dynamic programming algorithm, Btrim can handle indels in adapters and barcodes. It removes low quality regions and tr...
Throughput Improvement of STS-Based MC DS-CDMA System with Variable Spreading Factor
The throughput enhancement of Space-Time Spreading (STS)-based Multicarrier Direct Sequence Code Division Multiple Access (MC DS-CDMA) system is investigated in this paper. Variable Spreading Factor (VSF) is utilized to improve the data throughput of the system. In this contribution, an analytical approach is proposed to compute a new expression for the Bit Error Rate (BER) performance of t...